This is meant to be a very simple introduction to working with the tracking data housed in this repo. For a more in-depth guide on working with the NFL’s tracking data, check out Mike Lopez’s notebook from the most recent Big Data Bowl.
Tracking data has been billed as “the future of sports analytics,” but it’s notoriously difficult to both acquire and use. This repo was created to help alleviate those issues; it contains tracking data from the NFL’s Next Gen Stats (NGS) Highlights for the 2017-2019 seasons, as well as a few R scripts with helper functions to make it easier to work with the data.
In this walk-through, we will fetch the list of available highlights, grab the tracking data for a single play, plot individual frames of that play, and then animate it.
Before getting started with the data, we need to install and load a few libraries, as well as the R scripts containing helper functions.
# * install packages ----
install.packages(c("devtools", "tidyverse"))
devtools::install_github('thomasp85/ggforce')
devtools::install_github('thomasp85/gganimate')
# * load packages ----
library(devtools)
library(dplyr)
library(gganimate)
library(ggforce)
library(ggplot2)
library(readr)
# * load helper functions ----
source_url("https://raw.githubusercontent.com/asonty/ngs_highlights/intro/utils/scripts/data_utils.R")
source_url("https://raw.githubusercontent.com/asonty/ngs_highlights/intro/utils/scripts/plot_utils.R")
# source("./utils/scripts/data_utils.R")
# source("./utils/scripts/plot_utils.R")
We can use the fetch_highlights_list() function to grab a list of the NGS Highlights in this repo, and by using the team_ and season_ arguments, we can filter the list down.
Lamar Jackson had some ridiculous plays during his MVP season, so let’s look at the Ravens’ highlights from 2019:
highlights <- fetch_highlights_list(team_ = "BAL", season_ = 2019)
| playKey | playDesc | team | season | week | gameId | playId |
|---|---|---|---|---|---|---|
| 237 | (13:56) (Shotgun) L.Jackson pass deep middle to W.Snead for 33 yards, TOUCHDOWN. | BAL | 2019 | 1 | 2019090803 | 1082 |
| 238 | (4:28) (Shotgun) L.Jackson pass deep middle to M.Brown for 83 yards, TOUCHDOWN. | BAL | 2019 | 1 | 2019090803 | 683 |
| 239 | (8:12) (Shotgun) L.Jackson pass deep right to M.Andrews for 27 yards, TOUCHDOWN. | BAL | 2019 | 2 | 2019091500 | 343 |
| 240 | (1:24) (Shotgun) J.Hurst reported in as eligible. L.Jackson right tackle for 8 yards, TOUCHDOWN. | BAL | 2019 | 7 | 2019102010 | 2997 |
| 241 | (5:11) (Shotgun) R.Wilson pass short right intended for J.Brown INTERCEPTED by M.Peters at BLT 33. M.Peters for 67 yards, TOUCHDOWN. | BAL | 2019 | 7 | 2019102010 | 1654 |
| 242 | (8:18) (Shotgun) L.Jackson left end for 47 yards, TOUCHDOWN. | BAL | 2019 | 10 | 2019111001 | 2257 |
| 243 | (11:49) (Shotgun) L.Jackson right end to BLT 48 for 3 yards. Lateral to R.Griffin III pushed ob at CIN 43 for 9 yards (J.Bates III). | BAL | 2019 | 10 | 2019111001 | 1000 |
| 244 | (4:12) (Shotgun) G.Edwards left guard for 63 yards, TOUCHDOWN. | BAL | 2019 | 11 | 2019111700 | 4243 |
| 245 | (9:11) (Shotgun) L.Jackson left tackle to HST 20 for 39 yards (J.Reid). HST-L.Johnson was injured during the play. | BAL | 2019 | 11 | 2019111700 | 2795 |
| 246 | (9:54) (Shotgun) L.Jackson pass short middle to W.Snead IV for 4 yards, TOUCHDOWN. Caught at goal line, crossing to right. | BAL | 2019 | 14 | 2019120801 | 3526 |
| 247 | (13:44) (Shotgun) L.Jackson pass deep left to H.Hurst for 61 yards, TOUCHDOWN \[J.Hughes\]. Flag pattern, caught at BUF 41. | BAL | 2019 | 14 | 2019120801 | 2271 |
| 248 | (1:04) (Shotgun) L.Jackson pass deep left to S.Roberts for 33 yards, TOUCHDOWN. Lamar Jackson’s 4th TD pass of game and 32nd of season. | BAL | 2019 | 15 | 2019121200 | 2839 |
| 249 | (5:14) (Shotgun) L.Jackson pass deep middle to M.Brown for 24 yards, TOUCHDOWN. | BAL | 2019 | 15 | 2019121200 | 2486 |
| 250 | (13:40) (Shotgun) L.Jackson right guard pushed ob at TEN 42 for 27 yards (W.Woodyard). | BAL | 2019 | 19 | 2020011101 | 3181 |
| 251 | (12:16) (Shotgun) L.Jackson scrambles left guard to TEN 27 for 30 yards (L.Ryan). | BAL | 2019 | 19 | 2020011101 | 2230 |
| 252 | (:18) (Shotgun) L.Jackson pass deep right to M.Brown to TEN 04 for 38 yards (A.Hooker). | BAL | 2019 | 19 | 2020011101 | 1899 |
The first column in the table (playKey) is a unique identifier for each play in the dataset, and is used by the fetch_play_data() function to grab the tracking data for a play. Let’s take a look at Lamar Jackson’s 47-yard touchdown run. The playKey for that play is 242, so we’ll provide that to fetch_play_data().
play_data <- fetch_play_data(playKey_ = 242)
| gameId | playId | playType | season | seasonType | week | preSnapHomeScore | preSnapVisitorScore | playDirection | quarter | gameClock | down | yardsToGo | yardline | yardlineSide | yardlineNumber | absoluteYardlineNumber | possessionFlag | homeTeamFlag | teamAbbr | frame | displayName | esbId | gsisId | jerseyNumber | nflId | position | positionGroup | time | x | y | s | o | dir | event | playDescription |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019111001 | 2257 | play\_type\_rush | 2019 | REG | 10 | 10 | 28 | left | 3 | 08:18:00 | 2 | 3 | CIN 47 | CIN | 47 | 57 | 0 | 1 | CIN | 0 | Geno Atkins | ATK216644 | 00-0027720 | 97 | 496762 | DT | DL | 2019-11-10 19:46:19 | 54.74 | 29.83 | 0.01 | 80.80 | 95.59 | huddle\_start\_offense | (8:18) (Shotgun) L.Jackson left end for 47 yards, TOUCHDOWN. |
| 2019111001 | 2257 | play\_type\_rush | 2019 | REG | 10 | 10 | 28 | left | 3 | 08:18:00 | 2 | 3 | CIN 47 | CIN | 47 | 57 | 0 | 1 | CIN | 1 | Geno Atkins | ATK216644 | 00-0027720 | 97 | 496762 | DT | DL | 2019-11-10 19:46:19 | 54.74 | 29.83 | 0.01 | 80.80 | 98.23 | NA | (8:18) (Shotgun) L.Jackson left end for 47 yards, TOUCHDOWN. |
| 2019111001 | 2257 | play\_type\_rush | 2019 | REG | 10 | 10 | 28 | left | 3 | 08:18:00 | 2 | 3 | CIN 47 | CIN | 47 | 57 | 0 | 1 | CIN | 2 | Geno Atkins | ATK216644 | 00-0027720 | 97 | 496762 | DT | DL | 2019-11-10 19:46:20 | 54.74 | 29.83 | 0.01 | 81.44 | 94.54 | NA | (8:18) (Shotgun) L.Jackson left end for 47 yards, TOUCHDOWN. |
| 2019111001 | 2257 | play\_type\_rush | 2019 | REG | 10 | 10 | 28 | left | 3 | 08:18:00 | 2 | 3 | CIN 47 | CIN | 47 | 57 | 0 | 1 | CIN | 3 | Geno Atkins | ATK216644 | 00-0027720 | 97 | 496762 | DT | DL | 2019-11-10 19:46:20 | 54.74 | 29.84 | 0.01 | 81.44 | 86.27 | NA | (8:18) (Shotgun) L.Jackson left end for 47 yards, TOUCHDOWN. |
| 2019111001 | 2257 | play\_type\_rush | 2019 | REG | 10 | 10 | 28 | left | 3 | 08:18:00 | 2 | 3 | CIN 47 | CIN | 47 | 57 | 0 | 1 | CIN | 4 | Geno Atkins | ATK216644 | 00-0027720 | 97 | 496762 | DT | DL | 2019-11-10 19:46:20 | 54.74 | 29.84 | 0.01 | 81.44 | 84.09 | NA | (8:18) (Shotgun) L.Jackson left end for 47 yards, TOUCHDOWN. |
In my opinion, the most fun way to get started with tracking data is through visualizations. To that end, we can use the plot_play_frame() function to plot any given frame in a play.
It’s important to note that the tracking data contains the entire runtime of the play, including all of the dead time prior to the line being set, and in some cases even the team celebrations after a touchdown is scored. So let’s first find the ‘frame interval’ of the play:
first_frame <- play_data %>%
filter(event == "line_set") %>%
distinct(frame) %>%
slice_max(frame) %>%
pull()
final_frame <- play_data %>%
filter(event == "tackle" | event == "touchdown" | event == "out_of_bounds") %>%
distinct(frame) %>%
slice_max(frame) %>%
pull()
first_frame
## [1] 113
final_frame
## [1] 270
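The event names used in the filters above (line_set, tackle, touchdown, out_of_bounds) won't all show up in every play, so it can help to list which events a play actually contains and the frames they're tagged on. Here's a minimal sketch on toy data, assuming missing events are stored as NA; toy_play and its values are made up for illustration, and only the column names mirror play_data.

```r
# Toy stand-in for play_data: two tracked entities over six frames.
# Column names match the tutorial; the values are invented.
toy_play <- data.frame(
  frame       = rep(0:5, each = 2),
  displayName = rep(c("Player A", "ball"), times = 6),
  event       = rep(c("huddle_start_offense", NA, "line_set",
                      "ball_snap", NA, "touchdown"), each = 2),
  stringsAsFactors = FALSE
)

# Keep one row per (frame, event) pair and drop frames with no event.
event_frames <- unique(toy_play[!is.na(toy_play$event), c("frame", "event")])
event_frames <- event_frames[order(event_frames$frame), ]
rownames(event_frames) <- NULL

print(event_frames)
```

Running the same two lines against the real play_data should show where the snap, handoff, and end-of-play events fall, which makes it easier to pick sensible first and final frames.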
Now that we’ve got a better idea of the interval in which the play takes place, let’s visualize it.
plot_play_frame(play_data_ = play_data, frame_ = 180)
plot_play_frame() also has a velocities_ parameter, which, when set to TRUE, adds the players’ velocity vectors to the plot.
plot_play_frame(play_data_ = play_data, frame_ = 200, velocities_ = TRUE)
In past Big Data Bowls, some of the top submissions borrowed a concept from soccer called “pitch control.” Pitch control models aim to quantify the areas of the field that players/teams control; an example of a basic pitch control model is Voronoi tessellation. We can use the voronoi_ argument to add a Voronoi layer to play frame plots:
plot_play_frame(play_data_ = play_data, frame_ = 220, velocities_ = FALSE, voronoi_ = TRUE)
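To make the idea concrete: in a Voronoi tessellation, every point on the field belongs to whichever player is closest to it. The sketch below approximates that rule on a 1-yard grid in base R, using made-up player positions. It only illustrates the nearest-player idea and says nothing about how plot_play_frame() draws its Voronoi layer.

```r
# Hypothetical player positions (x along the field, y across it); values are invented.
players <- data.frame(
  team = c("BAL", "BAL", "CIN", "CIN"),
  x    = c(40, 55, 50, 70),
  y    = c(20, 35, 25, 10)
)

# Lay a 1-yard grid over the field (roughly 120 x 53.3 yards including end zones).
grid <- expand.grid(x = seq(0.5, 119.5, by = 1), y = seq(0.5, 52.5, by = 1))

# For every grid cell, find the index of the nearest player (squared Euclidean distance).
nearest <- apply(grid, 1, function(cell) {
  which.min((players$x - cell["x"])^2 + (players$y - cell["y"])^2)
})

# Share of the field "controlled" by each team under the nearest-player rule.
control_share <- prop.table(table(players$team[nearest]))
print(round(control_share, 3))
```

A finer grid gets closer to the exact tessellation at the cost of more distance calculations. More elaborate pitch control models replace plain distance with something like time-to-arrive, which is where the velocity vectors come in.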
The final built-in function we can use is plot_play_sequence(), which plots n_ frames at evenly spaced intervals between first_frame_ and final_frame_:
plot_play_sequence(play_data, first_frame_ = first_frame, final_frame_ = final_frame, n_ = 6, velocities_ = TRUE, voronoi_ = TRUE)
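For a sense of which frames "evenly spaced" lands on, here's one way to pick them by hand. This is a sketch of the general idea, assuming simple rounding of an even sequence; plot_play_sequence() may choose its frames differently.

```r
# Frame interval found earlier in the walk-through.
first_frame <- 113
final_frame <- 270
n_frames <- 6

# Evenly spaced values between the endpoints, rounded to whole frame numbers.
frames_to_plot <- round(seq(first_frame, final_frame, length.out = n_frames))

print(frames_to_plot)
```

Each of these frame numbers could then be passed to plot_play_frame() one at a time if you want more control over the individual plots.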
The next step in visualizing a play is animation. Rather than just animating the data as-is, let’s transform it a bit. In our animation, we’re going to highlight the fastest player on each team at every frame of the play.
First, we’ll reduce the dataset, split it up into player and ball data, and grab some details.
# * reduce dataset ----
reduced_play_data <- play_data %>% filter(frame >= first_frame, frame <= final_frame+10)
# * get play details ----
play_desc <- reduced_play_data$playDescription %>% .[1]
play_dir <- reduced_play_data$playDirection %>% .[1]
yards_togo <- reduced_play_data$yardsToGo %>% .[1]
los <- reduced_play_data$absoluteYardlineNumber %>% .[1]
togo_line <- if(play_dir=="left") los-yards_togo else los+yards_togo
# * separate player and ball tracking data ----
player_data <- reduced_play_data %>%
select(frame, homeTeamFlag, teamAbbr, displayName, gsisId, jerseyNumber, position, positionGroup,
x, y, s, o, dir, event) %>%
filter(displayName != "ball")
ball_data <- reduced_play_data %>%
select(frame, homeTeamFlag, teamAbbr, displayName, jerseyNumber, position, positionGroup,
x, y, s, o, dir, event) %>%
filter(displayName == "ball")
# * get team details ----
h_team <- reduced_play_data %>% filter(homeTeamFlag == 1) %>% distinct(teamAbbr) %>% pull()
a_team <- reduced_play_data %>% filter(homeTeamFlag == 0) %>% distinct(teamAbbr) %>% pull()
# call helper function to get team colors
team_colors <- fetch_team_colors(h_team_ = h_team, a_team_ = a_team)
h_team_color1 <- team_colors[1]
h_team_color2 <- team_colors[2]
a_team_color1 <- team_colors[3]
a_team_color2 <- team_colors[4]
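The to-go line logic above is easy to sanity-check by hand. Here's a minimal sketch using the values from this play (line of scrimmage at 57, 3 yards to go, play heading left). first_down_line() is a hypothetical name used only for illustration; it isn't one of the repo's helper functions.

```r
# Hypothetical helper: where the first-down marker sits on the x axis.
# When the offense is moving left, x decreases toward the goal line,
# so the marker is yards_togo *below* the line of scrimmage.
first_down_line <- function(los, yards_togo, play_dir) {
  if (play_dir == "left") los - yards_togo else los + yards_togo
}

left_marker  <- first_down_line(los = 57, yards_togo = 3, play_dir = "left")   # 54
right_marker <- first_down_line(los = 57, yards_togo = 3, play_dir = "right")  # 60

print(c(left = left_marker, right = right_marker))
```

The same direction check matters any time you compute something "downfield" from the raw x coordinate, since the tracking data isn't flipped so that every offense moves the same way.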
Next, we’ll compute the x and y components of each player’s velocity. Note that the dir variable specifies the direction of the player’s movement; it is 0 degrees when the player is moving ‘up’ the field (towards the far sideline) and increases in the clockwise direction.
# * compute velocity components ----
# velocity angle in radians
player_data$dir_rad <- player_data$dir * pi / 180
# velocity components
player_data$v_x <- sin(player_data$dir_rad) * player_data$s
player_data$v_y <- cos(player_data$dir_rad) * player_data$s
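Because dir is measured clockwise from ‘up’ rather than counterclockwise from the x axis, sin gives the x component and cos gives the y component, which is the reverse of the usual math-class convention. A quick check on the four compass directions makes this easier to trust; velocity_components() is a hypothetical wrapper written just for this sketch.

```r
# Hypothetical wrapper around the same formulas used above.
velocity_components <- function(s, dir_deg) {
  dir_rad <- dir_deg * pi / 180
  data.frame(v_x = s * sin(dir_rad), v_y = s * cos(dir_rad))
}

# A player moving at 5 yards/second in each of the four compass directions:
# 0 = up (+y), 90 = right (+x), 180 = down (-y), 270 = left (-x).
check <- velocity_components(s = 5, dir_deg = c(0, 90, 180, 270))

print(round(check, 3))
```

Whatever the direction, sqrt(v_x^2 + v_y^2) should recover the original speed s, which is a handy invariant to test if the arrows in a plot ever look off.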
Finally, we’ll identify the fastest players on each team in every frame, and merge that information with our player_data:
# there are assuredly better ways to do this
# * identify the fastest player from each team at each frame ----
fastest_players <- player_data %>% # ball-tracking rows were already removed above
group_by(frame, teamAbbr) %>% # group by frame and team
slice_max(s, n = 1) %>% # keep the player(s) with the highest speed on each team at every frame (ties are kept)
mutate(isFastestFlag = 1) %>% # create new flag identifying fastest players
ungroup() %>%
select(frame, gsisId, isFastestFlag) %>% # reduce dataset to the columns needed for joining and the new flag
arrange(frame) # sort by frame
player_data <- player_data %>%
left_join(fastest_players, by = c("frame" = "frame", "gsisId" = "gsisId")) %>% # join on frame and gsisId
mutate(isFastestFlag = case_when(is.na(isFastestFlag) ~ 0, TRUE ~ 1)) # replace NA values for isFastestFlag with 0
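One of those "better ways" is to skip the join entirely and compare each player's speed with the maximum in their frame and team group. Here's a minimal base R sketch on toy data; toy_players and its values are invented, and only the column names mirror player_data.

```r
# Toy stand-in for player_data: two players per team across two frames.
toy_players <- data.frame(
  frame    = c(1, 1, 1, 1, 2, 2, 2, 2),
  teamAbbr = c("BAL", "BAL", "CIN", "CIN", "BAL", "BAL", "CIN", "CIN"),
  gsisId   = c("a", "b", "c", "d", "a", "b", "c", "d"),
  s        = c(3.1, 4.2, 2.0, 1.5, 5.0, 4.9, 6.3, 6.1),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the max speed within that row's frame/team group.
group_max <- ave(toy_players$s, toy_players$frame, toy_players$teamAbbr, FUN = max)

# Flag rows whose speed equals their group's maximum (ties are all flagged).
toy_players$isFastestFlag <- as.integer(toy_players$s == group_max)

print(toy_players)
```

The dplyr equivalent would be a grouped mutate() that compares s with max(s), which keeps everything in one pipeline and avoids having to fill in NA values after a join.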
Unfortunately, we can’t just use the plot_play_frame() function to animate a play, so we’re going to peel back the function’s innards to create our animation.
play_frames <- plot_field() + # plot_field() is a helper function that returns a ggplot2 object of an NFL field
# line of scrimmage
annotate(
"segment",
x = los, xend = los, y = 0, yend = 160/3,
colour = "#0d41e1"
) +
# 1st down marker
annotate(
"segment",
x = togo_line, xend = togo_line, y = 0, yend = 160/3,
colour = "#f9c80e"
) +
# away team velocities
geom_segment(
data = player_data %>% filter(teamAbbr == a_team),
mapping = aes(x = x, y = y, xend = x + v_x, yend = y + v_y),
colour = a_team_color1, size = 1, arrow = arrow(length = unit(0.01, "npc"))
) +
# home team velocities
geom_segment(
data = player_data %>% filter(teamAbbr == h_team),
mapping = aes(x = x, y = y, xend = x + v_x, yend = y + v_y),
colour = h_team_color1, size = 1, arrow = arrow(length = unit(0.01, "npc"))
) +
# away team locations
geom_point(
data = player_data %>% filter(teamAbbr == a_team),
mapping = aes(x = x, y = y),
fill = "#ffffff", color = a_team_color2,
shape = 21, alpha = 1, size = 6
) +
# away team jersey numbers
geom_text(
data = player_data %>% filter(teamAbbr == a_team),
mapping = aes(x = x, y = y, label = jerseyNumber),
color = a_team_color1, size = 3.5 # family = "mono"
) +
# home team locations
geom_point(
data = player_data %>% filter(teamAbbr == h_team),
mapping = aes(x = x, y = y),
fill = h_team_color1, color = h_team_color2,
shape = 21, alpha = 1, size = 6
) +
# home team jersey numbers
geom_text(
data = player_data %>% filter(teamAbbr == h_team),
mapping = aes(x = x, y = y, label = jerseyNumber),
color = h_team_color2, size = 3.5 # family = "mono"
) +
# ball location
geom_point(
data = ball_data,
mapping = aes(x = x, y = y),
fill = "#935e38", color = "#d9d9d9",
shape = 21, alpha = 1, size = 4
) +
# highlight fastest players
geom_point(
data = player_data %>% filter(isFastestFlag == 1),
mapping = aes(x = x, y = y),
colour = "#e9ff70",
alpha = 0.5, size = 8
) +
# play description and always cite your data source!
labs(
title = play_desc,
caption = "Source: NFL Next Gen Stats"
) +
# animation stuff
transition_time(frame) +
ease_aes('linear') +
NULL
# ensure timing of play matches 10 frames-per-second (h/t NFL Football Ops)
play_length <- length(unique(player_data$frame))
play_anim <- animate(
play_frames,
fps = 10,
nframes = play_length,
width = 850,
height = 500,
end_pause = 10
)
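As a rough check on the timing: rendering one animation frame per tracking frame at 10 frames per second should play back in real time. The arithmetic below is a sketch that assumes the frame numbers in the reduced data are contiguous, and it ignores the extra hold added by end_pause.

```r
# Frame interval used to build reduced_play_data earlier in the walk-through.
first_frame <- 113
final_frame <- 270
fps <- 10

# reduced_play_data keeps frames first_frame through final_frame + 10.
frames <- seq(first_frame, final_frame + 10)
play_length <- length(unique(frames))

# Playback time in seconds when one tracking frame maps to one rendered frame.
duration_seconds <- play_length / fps

print(c(frames = play_length, seconds = duration_seconds))
```

If the rendered frame count were set lower than the number of tracking frames, the animation would still cover the whole play but would run faster than real time at the same fps.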